Video compression using adaptive selection of groups of frames, adaptive bit allocation, and adaptive replenishment

ABSTRACT

The present invention provides video signal compression that efficiently groups pictures in a video stream into variably-sized groups of pictures (GOPs), thereby providing lower achievable output signal bit rates and higher output signal quality. The video signal compression maximizes the output signal quality by appropriately allocating bits among individual pictures and GOPs in the output signal. The video signal compression of the present invention also applies compression methods that reduce noise in the output signal, by utilizing a macroblock-based tunable conditional replenishment technique. The conditional replenishment technique exploits the similarities among images in the variably-sized GOPs to further minimize output bit rate and maximize the output signal quality. An analysis-by-synthesis method is also provided to select a best asynchronous sampling method among various generated candidate output streams.

PRIORITY AND RELATED APPLICATIONS

[0001] The present application claims priority to provisional patent application entitled, “Video Processing Method with General and Specific Applications,” filed on Jul. 11, 2000 and assigned U.S. application Ser. No. 60/217,301. The present application is also related to non-provisional application entitled, “Adaptive Edge Detection and Enhancement for Image Processing,” (attorney docket number 07816-105003) filed on Jul. 11, 2001 and assigned U.S. application Ser. No. ______; and non-provisional application entitled, “System and Method for Calculating an Optimum Display Size for a Visual Object,” (attorney docket number 07816-105002) filed on Jul. 11, 2001 and assigned U.S. application Ser. No. ______.

FIELD OF THE INVENTION

[0002] The present invention relates to the processing of a video stream and more specifically relates to the improvement of video stream compression by adaptively selecting a group of pictures based on video stream content, by adaptively allocating bits to generate a compressed video stream, and by adaptively replenishing macroblocks.

BACKGROUND OF THE INVENTION

[0003] Recent advancements in communication technologies have enabled the widespread distribution of data over communication mediums such as the Internet and broadband cable systems. This increased capability has led to increased demand for the distribution of a diverse range of content over these communication mediums. Whereas early uses of the Internet were often limited to the distribution of raw data, more recent advances include the distribution of HTML-based graphics and audio files.

[0004] More recent efforts have been made to distribute video media over these communication mediums. However, because of the large amount of data needed to represent a video presentation, the data is typically compressed prior to distribution. Data compression is a well-known means for conserving transmission resources when transmitting large amounts of data or conserving storage resources when storing large amounts of data. In short, data compression involves minimizing or reducing the size of a data signal (e.g., a data file) in order to yield a more compact digital representation of that data signal. Because digital representations of audio and video data signals tend to be very large, data compression is virtually a necessary step in the process of widespread distribution of digital representations of audio and video signals.

[0005] Fortunately, video signals are typically well suited for standard data compression techniques. Most video signals include significant data redundancy. Within a single video frame (image), there typically exists significant correlation among adjacent portions of the frame, referred to as spatial correlation. Similarly, adjacent video frames tend to include significant correlation between corresponding image portions, referred to as temporal correlation. Moreover, there is typically a considerable amount of data in an uncompressed video signal that is irrelevant. That is, the presence or absence of that data will not perceivably affect the quality of the output video signal. Because video signals often include large amounts of such redundant and irrelevant data, video signals are typically compressed prior to transmission and then decompressed again after transmission.

[0006] Generally, the distribution of a video signal involves a transmission unit and a receiving unit. The transmission unit will receive a video signal as input and will compress the video signal and transmit the signal to the receiving unit. Compression of a video signal is usually performed by an encoder. The encoder typically reduces the data rate of the input video signal to a level that is predetermined by the capacity of the transmission medium. For example, for a typical video file transfer, the required data rate can be reduced from about 30 Megabits per second to about 384 kilobits per second. The compression ratio is defined as the ratio between the size of the input video signal and the size of the compressed video signal. If the transmission medium is capable of a high transmission rate, then a lower compression ratio can be used. On the other hand, if the transmission medium is capable of a relatively low transmission rate, then a higher compression ratio is needed.
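As an illustration of the compression ratio defined above, the following minimal sketch computes the ratio for the example data rates given in the text. The function name is hypothetical, and the sketch assumes decimal units (1 Megabit = 1,000,000 bits and 1 kilobit = 1,000 bits), which the text does not specify.

```python
def compression_ratio(input_bits_per_second: float, output_bits_per_second: float) -> float:
    """Ratio between the input data rate and the compressed data rate."""
    if output_bits_per_second <= 0:
        raise ValueError("output rate must be positive")
    return input_bits_per_second / output_bits_per_second


# Example rates from the text; decimal unit prefixes are an assumption.
INPUT_RATE = 30 * 1_000_000   # about 30 Megabits per second
OUTPUT_RATE = 384 * 1_000     # about 384 kilobits per second

ratio = compression_ratio(INPUT_RATE, OUTPUT_RATE)
print(f"compression ratio is about {ratio:.1f}:1")
```

Under these assumptions, the example corresponds to a compression ratio of roughly 78 to 1.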

[0007] After the receiving unit receives the compressed video signal, the signal must be decompressed before it can be adequately displayed. The decompression process is performed by a decoder. In some applications, the decoder is used to decompress the compressed video signal so that it is identical to the original input video signal. This is referred to as lossless compression, because no data is lost in the compression and decompression processes. The majority of encoding and decoding applications, however, use lossy compression, wherein some predefined amount of the original data is irretrievably lost in the compression and expansion process. In order to decompress the video stream to its original (pre-encoding) data size, the lost data must be replaced by new data. Unfortunately, lossy compression of video signals will almost always result in the degradation of the output video signal when displayed after decoding, because the new data is usually not identical to the lost original data. Video signal degradation typically manifests itself as a perceivable flaw in a displayed video image. These flaws are typically referred to as noise. Well-known kinds of video noise include blockiness, mosquito noise, salt-and-pepper noise, and fuzzy edges. The data rate (or bit rate) often determines the quality of the decoded video stream. A video stream that was encoded with a high bit rate is generally a higher quality video stream than one encoded at a lower bit rate.

[0008] Conventional methods of compressing video signals include the partitioning of the video signal into groups of pictures. Unfortunately, conventional compression techniques utilize inefficient and arbitrarily simple methods of grouping pictures that result in higher output signal bit rates and/or lower output signal quality. Moreover, because these conventional techniques use arbitrarily simple picture groupings, they do not provide the opportunity to maximize the output signal quality by appropriately allocating bits among pictures and picture groups in the output signal. Finally, these compression techniques typically apply compression methods that result in the propagation and amplification of noise, especially in background portions of a video picture.

[0009] Therefore, there is a need in the art for video signal compression that efficiently groups pictures in a video stream and provides for lower output signal bit rates and higher output signal quality. The video signal compression also should maximize the output signal quality by appropriately allocating bits among pictures and picture groups in the output signal. In addition, the video signal compression also should apply compression methods that reduce noise in the output signal. Finally, the method should enable the use of various sampling techniques and should enable the selection of an output stream, based on the sampling technique providing the best video stream.

SUMMARY OF THE INVENTION

[0010] The present invention provides video signal compression that efficiently groups pictures in a video stream into variably-sized groups of pictures (GOPs), thereby providing lower achievable output signal bit rates and higher output signal quality. The video signal compression maximizes the output signal quality by appropriately allocating bits among pictures and picture groups in the output signal. An adaptive method of bit allocation among picture groups and within the pictures in those picture groups enables the efficient allocation of bits, according to the relative sizes of the picture groups. The video signal compression of the present invention also applies compression methods that reduce noise in the output signal, by utilizing a macroblock-based tunable conditional replenishment technique. The conditional replenishment technique exploits the similarities among images in the variably-sized GOPs to further minimize output bit rate and maximize the output signal quality. An analysis-by-synthesis method is also provided to select a best asynchronous sampling method among candidate sampling procedures.

[0011] In one aspect of the invention, a method is provided for processing an input video stream comprising a series of pictures. A first scene change is detected between a first scene in the input video stream and a second scene in the input video stream. The method classifies the first picture following the first scene change as an intra-picture (I-picture).

[0012] In another aspect of the invention, the input stream processing method determines whether there are a predetermined number of pictures between the first I-picture and a second scene change. A second picture in the input video stream is classified as a second I-picture, where it is determined that the predetermined number of pictures exist between the first intra-picture and the second scene change, wherein the second picture coincides with the predetermined number of pictures.

[0013] In yet another aspect of the invention, a system is provided for organizing a series of pictures in an input video stream into at least one group of pictures (GOP). The system includes a picture grouping module for detecting a scene change in the series of pictures and for classifying a first picture following the scene change as a first intra-picture (I-picture). The picture grouping module also can classify at least one other picture following the scene change as a predicted picture (P-picture) and can classify at least one second picture as a bi-directionally predicted picture (B-picture). The system also includes a bit allocation module for determining whether a first GOP uses less than a predetermined target number of bits and further operative to allocate an unneeded bit to a second GOP in response to a determination that the first GOP uses less than the predetermined target number of bits.

[0014] The various aspects of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is a block diagram depicting an exemplary video stream comprised of a series of video pictures.

[0016] FIG. 2 is a flowchart depicting an exemplary method for coding, transmitting, and decoding a video stream.

[0017] FIG. 3 is a block diagram depicting a system for encoding a video stream that is an exemplary embodiment of the present invention.

[0018] FIG. 4 depicts a conventional decoding system for receiving an encoded video stream and providing decoded video and audio output.

[0019] FIG. 5 is a block diagram depicting an exemplary selection of picture encoding modes in a GOP.

[0020] FIG. 6 is a block diagram depicting an exemplary timeline comparing the occurrence of scene changes in a video stream with alternative GOP size formats.

[0021] FIG. 7 is a flowchart depicting an exemplary method for creating GOPs of varying sizes.

[0022] FIG. 8 is a graph depicting a typical relationship between the bits generated by a conventional compression method and a conventional group of pictures.

[0023] FIG. 9 is a series of block diagrams and graphs comparing the generated bit graph of a conventional compression method with a generated bit graph of an exemplary embodiment of the present invention.

[0024] FIG. 10a is a flow chart depicting an exemplary method for adaptively allocating bits among variable-sized groups of pictures.

[0025] FIG. 10b is a flow chart depicting an exemplary method for adaptively allocating bits among pictures within a GOP.

[0026] FIG. 11 is a simplified illustration depicting successive pictures in an exemplary GOP divided into macroblocks.

[0027] FIG. 12 is a flowchart depicting an exemplary method for performing conditional replenishment on a macroblock-basis.

[0028] FIG. 13 is a flowchart depicting an exemplary method for generating and selecting between two sampling methods.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

[0029] The present invention provides video signal compression that efficiently groups pictures in a video stream into variably-sized groups of pictures (GOPs), thereby providing lower achievable output signal bit rates and higher output signal quality. The video signal compression maximizes the output signal quality by appropriately allocating bits among pictures and picture groups in the output signal. An adaptive method of bit allocation among picture groups and within the pictures in those picture groups enables the efficient allocation of bits, according to the relative sizes of the picture groups. The video signal compression of the present invention also applies compression methods that reduce noise in the output signal, by utilizing a macroblock-based tunable conditional replenishment technique. The conditional replenishment technique exploits the similarities among images in the variably-sized GOPs to further minimize output bit rate and maximize the output signal quality. An analysis-by-synthesis method is also provided to select a best asynchronous sampling method among multiple non-uniform and/or uniform sampling procedures.

[0030] An Exemplary Operating Environment

[0031] FIG. 1 is a block diagram depicting an exemplary video stream comprised of a series of video pictures. A video stream is simply a collection of related images that have been connected in a series to create the perception that objects in the image series are moving. Because of the large number of separate images that are required to produce a video stream, it is common that the series of images will be digitized and compressed, so that the entire video stream requires less space for transmission or storage. The process of compressing such a digitized video stream is often referred to as “encoding.” Among other things, encoding a video stream typically involves removing the irrelevant and/or redundant digital data from the digitized video stream. Once the video stream has been so compressed, a video stream must usually be decompressed before it can be properly rendered or displayed.

[0032] The video stream 100 depicted in FIG. 1 includes six separate images or pictures 102-112. Typically, a video stream is displayed to a viewer at about 30 frames per second. Therefore, the video stream 100 depicted in FIG. 1 would provide about 0.2 seconds of playback at the typical display rate.

[0033] Generally, there is little noticeable change from one picture in the series to the next. If a video stream were to be stored or transmitted without compression, large amounts of redundant data would be stored because of the significant video data overlap from one frame to the next. For video stream storage, the storage of such redundant data is consumptive of memory resources. For video stream transmission, the transmission of such redundant data significantly increases transmission time and may be impossible at certain data transmission rates.

[0034] Video stream compression is one means for reducing the size of a video stream. In short, video stream compression involves the elimination of irrelevant and/or redundant video data from the video stream. Moreover, many compression methods store only enough video data on a frame-by-frame basis to represent the differences from one frame to the next. For example, many compression methods store an intra-picture (I-Picture) that includes all or most of the video data for a particular frame/picture in a video stream. Subsequent pictures can be represented by predicted pictures (P-pictures) or by bi-directionally predicted pictures (B-pictures). P-pictures are encoded using motion-compensated prediction from a previous I-Picture or a previous P-Picture. B-pictures are encoded using motion-compensated prediction from either previous or subsequent I-pictures or P-pictures. B-pictures are not used in the prediction of other B-pictures or other P-pictures. Accordingly, I-pictures require the most video data and can be compressed the least. P-pictures require less video data than I-pictures and can be significantly compressed. B-pictures require the least amount of video data and can be compressed the most.

[0035] In the example of FIG. 1, the first picture 102 is an I-Picture. Accordingly, much of the video data of the image of the first picture 102 would be used to represent the first picture 102. The second picture 104 may be a B-Picture and, thus, may be represented in terms of video data differences with the I-Picture 102. Because the B-Picture 104 is bi-directionally predicted, it may also be represented in terms of differences with the P-Picture 106. The P-Picture 106, in turn, is predicted in terms of differences with the I-Picture 102. The P-Picture 106 is not represented in terms of differences with the B-Picture 104.
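The prediction relationships described for FIG. 1 can be sketched as follows. This is an illustrative sketch only: the function name and the list-of-letters representation are hypothetical, and the sketch assumes that each P-picture refers to the nearest preceding I-picture or P-picture and that each B-picture refers to the nearest preceding and nearest following I-picture or P-picture.

```python
def reference_pictures(types):
    """Map each picture index (display order) to the indices it is predicted from.

    `types` is a list of "I", "P", or "B". Only I-pictures and P-pictures
    serve as references; B-pictures are never used to predict other pictures.
    """
    anchors = [i for i, t in enumerate(types) if t in ("I", "P")]
    refs = {}
    for i, t in enumerate(types):
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = []                      # coded without reference to other pictures
        elif t == "P":
            refs[i] = previous[-1:]           # nearest earlier I- or P-picture
        elif t == "B":
            refs[i] = previous[-1:] + following[:1]
        else:
            raise ValueError(f"unknown picture type: {t!r}")
    return refs


# Pictures 102, 104, and 106 of FIG. 1 as indices 0, 1, and 2.
fig1_refs = reference_pictures(["I", "B", "P"])
print(fig1_refs)  # {0: [], 1: [0, 2], 2: [0]}
```

In this sketch the B-picture at index 1 depends on both the I-picture and the P-picture, while the P-picture at index 2 depends only on the I-picture, matching the description of pictures 102, 104, and 106.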

[0036] Differences between video pictures are often predicted based on calculated motion vectors. Motion vectors are well-known mathematical representations of the movement and/or expected movement of visual “objects” in a series of pictures in a video stream. In order to track and predict the motion of objects, pictures are divided into picture elements (pels). A pel may be a video pixel or some other definable division of a picture. In any event, object motion can be tracked by reference to corresponding pels in a series of related video pictures.

[0037] Often, a video picture (or other digitized picture) is encoded as a collection of blocks 116. Each block is typically an 8-by-8 square of pels. In addition, video pictures also are commonly divided into macroblocks that usually contain 6 blocks (4 blocks for the luminance signal and 2 blocks for the chrominance signal). Those skilled in the art will appreciate that the division of video pictures into blocks and macroblocks is arbitrary, but helpful to the creation of video compression standards. Moreover, the division of pictures into such blocks enables the representation of P-pictures and B-pictures in terms of other pictures in the video stream. This block/macroblock-based representation facilitates picture comparisons, based on corresponding portions of successive pictures. As described above, this representation further facilitates the compression of a video stream.
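The block and macroblock arithmetic described above can be sketched as follows. The sketch assumes that a macroblock covers a 16-by-16 area of luminance pels (a 2-by-2 arrangement of 8-by-8 blocks), which is consistent with the MPEG-1 and MPEG-2 standards but is not stated in the text. The 352-by-240 picture size and all names are hypothetical.

```python
BLOCK_SIZE = 8                 # each block is an 8-by-8 square of pels
LUMA_BLOCKS_PER_MB = 4         # 4 luminance blocks per macroblock
CHROMA_BLOCKS_PER_MB = 2       # 2 chrominance blocks per macroblock
MB_SIZE = 2 * BLOCK_SIZE       # assumption: 16-by-16 luminance area per macroblock


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Number of macroblock columns and rows needed to cover a picture."""
    cols = -(-width // MB_SIZE)    # ceiling division
    rows = -(-height // MB_SIZE)
    return cols, rows


def block_count(width: int, height: int) -> int:
    """Total number of 8-by-8 blocks (luminance plus chrominance) in a picture."""
    cols, rows = macroblock_grid(width, height)
    return cols * rows * (LUMA_BLOCKS_PER_MB + CHROMA_BLOCKS_PER_MB)


cols, rows = macroblock_grid(352, 240)   # hypothetical picture size
print(cols, rows, cols * rows, block_count(352, 240))  # 22 15 330 1980
```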

[0038] FIG. 2 is a flowchart depicting an exemplary method for coding, transmitting, and decoding a video stream. One application for which the described exemplary embodiment of the present invention is particularly suited is that of video stream processing. Because of the large number of separate images that are required to produce a video stream, it is common that the series of images will be digitized and compressed (encoded), so that the entire video stream requires less space for transmission or storage. Once the video stream has been so compressed, the video stream must usually be decompressed before it can be properly displayed. The flow chart of FIG. 2 depicts the steps that are generally followed to encode, decode, and display a video stream.

[0039] The method of FIG. 2 begins at start block 200 and proceeds to step 202. At step 202, the input video stream is prepared for encoding. Step 202 may be performed by an encoder or prior to sending the video stream to an encoder. In any event, the video stream can be modified to facilitate encoding. Indeed, various exemplary embodiments of the present invention are directed to various aspects of performing this step. The following Figures and accompanying text are drawn to describing those embodiments.

[0040] The method proceeds from step 202 to step 204. At step 204, the input video stream is encoded. As described, the encoding process involves, among other things, the compression of the digitized data making up the input video stream. For the purposes of this description, the terms “encoding” and “compression” are used interchangeably. Once the video stream has been encoded, it can be transmitted or stored in its compressed form. At step 206, the encoded video bit stream is transmitted. Often this transmission can be made over conventional broadcast infrastructure, but could also be over broadband communication resources and/or internet-based communication resources.

[0041] The method proceeds from step 206 to step 208. At step 208, the received, encoded video stream is stored. As described above, the compressed video stream is significantly smaller than the input video stream. Accordingly, the storage of the received, encoded video stream requires fewer memory resources than storage of the input video stream would require. This storage step may be performed, for example, by a computer receiving the encoded video stream over the Internet. Those skilled in the art will appreciate that step 208 could be performed by a variety of well-known means and could even be eliminated from the method depicted in FIG. 2. For example, in a real-time streaming video application, the video stream is typically not stored prior to display.

[0042] The method proceeds from step 208 to step 210. At step 210, the video stream is decoded. Decoding a video stream includes, among other things, expanding (decompressing) the encoded video stream to its original data size. That is, the encoded video stream is expanded so that it is the same size as the input video stream. The irrelevant and/or redundant video data that was removed in the encoding process is replaced with new data. Various well-known algorithms are available for decoding an encoded video stream. Unfortunately, these algorithms are typically unable to return the encoded video stream to its original form without some image degradation. Consequently, a decoded video stream is typically filtered by a post-processing filter to reduce flaws (e.g., noise) in the decoded video stream.

[0043] Once the video stream has been decoded, it is suitable for displaying. The method of FIG. 2 proceeds from step 210 to step 212 and the enhanced video stream is displayed. The method then proceeds to end block 214 and terminates.

[0044] An Exemplary Encoding System

[0045] FIG. 3 is a block diagram depicting a system for encoding a video stream that is an exemplary embodiment of the present invention. The encoding system 300 receives a video input signal 302 and an audio input signal 304. The video input 302 is typically a series of digitized images that are linked together in series. The audio input 304 is simply the audio signal that is associated with the series of images making up the video input 302.

[0046] The video input 302 is first passed through a pre-processing filter 306 that, among other things, filters noise from the video input 302 to prepare the input video stream for encoding. The input video stream is then passed to the video encoder 310. The video encoder compresses the video signal by eliminating irrelevant and/or redundant data from the input video signal. The video encoder 310 may reduce the input video signal to a predetermined size to match the transmission requirements of the encoding system 300. Alternatively, the video encoder 310 may simply be configured to minimize the size of the encoded video signal. This configuration might be used, for example, to maximize the storage capacity of a storage medium (e.g., hard drive).

[0047] In a similar fashion, the audio input 304 is compressed by the audio encoder 308. The encoded audio signal is then passed with the encoded video signal to the video stream multiplexer 312. The video stream multiplexer 312 combines the encoded audio signal and the encoded video signal so that the signals can be separated and played back substantially simultaneously. After the encoded video and encoded audio signals have been combined, the encoding system outputs the combined signal as an encoded video stream 314. The encoded video stream 314 is thus prepared for transmission, storage, or other processing as needed by a particular application. Often, the encoded video stream 314 will be transmitted to a decoding system that will decode the encoded video stream 314 and prepare it for subsequent display.

[0048] In an exemplary embodiment of the present invention, the video input stream 302 can be further processed prior to encoding. In addition to the pre-processing performed by the pre-processing filter 306, the exemplary encoding system 300 can prepare the input video stream 302 for encoding by generating a control signal for the input video stream to facilitate compression. For example, a rate controller 320 can be used to match the output bit rate of the encoder to the capacity of the transmission channel or storage device. Furthermore, the rate controller 320 can be used to control the output video quality. For efficient rate control, the exemplary encoding system 300 includes a picture grouping module 316, a bit allocation module 318, and a bit rate controller 320.

[0049] The picture grouping module 316 can process a video input stream by selecting and classifying I-pictures in the video stream. The picture grouping module 316 can also select and classify P-pictures in the video stream. As is discussed in more detail below, the picture grouping module 316 can significantly improve the quality of the encoded video stream. Conventional encoding systems arbitrarily select I-pictures, by adhering to fixed-size picture groups. The exemplary coding system 300 can adaptively select I-pictures to maximize the encoded video stream quality.

[0050] The bit allocation module 318 can be used to enhance the quality of the encoded video bit stream by adaptively allocating bits among the groups of pictures defined by the picture grouping module 316 and by allocating bits among the pictures within a given group of pictures. Whereas conventional encoding systems often allocate bits in an arbitrary manner, the allocation module 318 can reallocate bits from the picture groups requiring less video data to picture groups requiring more video data. Consequently, the quality of the encoded video bit stream is enhanced by improving the quality of the groups of pictures requiring more video data for high quality representation.
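The reallocation of bits among groups of pictures can be sketched as follows. This is a minimal sketch under stated assumptions rather than the allocation method of the described embodiment: it assumes each GOP receives a target proportional to its size, that an estimate of the bits each GOP needs is available, and that bits left unused by one GOP are carried forward to the next. The function name, parameters, and numbers are hypothetical.

```python
def allocate_gop_bits(gop_sizes, bits_needed, bits_per_picture):
    """Return (budget, used) bit counts for each GOP.

    gop_sizes        -- number of pictures in each GOP
    bits_needed      -- assumed estimate of bits each GOP needs
    bits_per_picture -- assumed average target bits per picture
    """
    carry = 0                      # unneeded bits from earlier GOPs
    budgets, used = [], []
    for size, need in zip(gop_sizes, bits_needed):
        budget = size * bits_per_picture + carry   # target scaled by GOP size
        spent = min(need, budget)
        carry = budget - spent                     # surplus moves to the next GOP
        budgets.append(budget)
        used.append(spent)
    return budgets, used


budgets, used = allocate_gop_bits(
    gop_sizes=[10, 20, 10],
    bits_needed=[6000, 25000, 10000],
    bits_per_picture=1000,
)
print(budgets)  # [10000, 24000, 10000]
print(used)     # [6000, 24000, 10000]
```

In this example the first GOP leaves 4000 bits unused, and those bits raise the budget of the second GOP, which needs more than its size-based target.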

[0051] The bit rate controller 320 uses an improved method of conditional replenishment to further reduce the presence of noise in an encoded video bit stream. Conditional replenishment is a well-known aspect of video data compression. In conventional encoding systems, a picture element or a picture block will be encoded in a particular picture if the picture element or block has changed when compared to a previous picture. Where the picture element or block has not changed, the encoder will typically set a flag or send an instruction to the decoder to simply replenish the picture element or block with the corresponding picture element or block from the previous picture. The bit rate controller 320 of an exemplary embodiment of the present invention instead focuses on macroblocks and may condition the replenishment of a macroblock on the change of one or more picture elements and/or blocks within the macroblock. Alternatively, the bit rate controller 320 may condition the replenishment of a macroblock on a quantification of the change within the macroblock (e.g., the average change of each block) meeting a certain threshold requirement. In any event, the objective of the bit rate controller 320 is to further reduce the presence of noise in video data and to simplify the encoding of a video stream.
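The threshold-based alternative described above can be sketched as follows. The sketch assumes that the change within a block is measured as the mean absolute difference between corresponding pels, and that a macroblock is replenished from the previous picture when the average change over its blocks falls below a tunable threshold. The change measure, the threshold value, and all names are hypothetical, and 2-by-2 blocks are used in place of 8-by-8 blocks for brevity.

```python
def mean_abs_change(current_block, previous_block):
    """Mean absolute difference between corresponding pels of two blocks."""
    total = 0
    count = 0
    for cur_row, prev_row in zip(current_block, previous_block):
        for cur, prev in zip(cur_row, prev_row):
            total += abs(cur - prev)
            count += 1
    return total / count


def should_replenish(current_blocks, previous_blocks, threshold):
    """True if the macroblock can be copied from the previous picture.

    The macroblock is given as a list of blocks; the decision uses the
    average of the per-block changes (an assumed quantification).
    """
    changes = [
        mean_abs_change(cur, prev)
        for cur, prev in zip(current_blocks, previous_blocks)
    ]
    return sum(changes) / len(changes) < threshold


previous = [[[10, 10], [10, 10]]]          # one tiny block for illustration
nearly_same = [[[10, 11], [10, 10]]]       # mean change 0.25
very_different = [[[50, 60], [10, 10]]]    # mean change 22.5

print(should_replenish(nearly_same, previous, threshold=1.0))     # True
print(should_replenish(very_different, previous, threshold=1.0))  # False
```

Raising the threshold causes more macroblocks to be replenished rather than re-encoded, which is one way such a technique could be made tunable.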

[0052] A Conventional Decoding System

[0053] FIG. 4 depicts a conventional decoding system for receiving an encoded video stream and providing decoded video and audio output. The decoding system 400 receives an encoded video stream 402 as input to a video stream demultiplexer 404. The video stream demultiplexer separates the encoded video signal and the encoded audio signal from the encoded video stream 402. The encoded video signal is passed from the video stream demultiplexer 404 to the video decoder 406. Similarly, the encoded audio signal is passed from the video stream demultiplexer 404 to the audio decoder 410. The video decoder 406 and the audio decoder 410 expand the video signal and the audio signal to a size that is substantially identical to the size of the video input and audio input described above in connection with FIG. 3. Those skilled in the art will appreciate that various well-known algorithms and processes exist for decoding an encoded video and/or audio signal. It will also be appreciated that most encoding and decoding processes are lossy, in that some of the data in the original input signal is lost. Accordingly, the video decoder 406 will reconstruct the video signal with some signal degradation, which is often perceivable as flaws in the output image.

[0054] The post-processing filter 408 is used to counteract noise found in a decoded video signal that has been encoded and/or decoded using a lossy process. Examples of well-known noise types include mosquito noise, salt-and-pepper noise, and blockiness. The conventional post-processing filter 408 includes well-known algorithms to detect and counteract these and other known noise problems. The post-processing filter 408 generates a filtered, decoded video output 412. Similarly, the audio decoder 410 generates a decoded audio output 414. The video output 412 and the audio output 414 may be fed to appropriate ports on a display device, such as a television, or may be provided to some other display means such as a software-based media playback component on a computer. Alternatively, the video output 412 and the audio output 414 may be stored for subsequent display.

[0055] As described above, the video decoder 406 decompresses or expands the encoded video signal 402. While there are various well-known methods for encoding and decoding a video signal, in all of the methods, the decoder must be able to interpret the encoded signal. The typical decoder is able to interpret the encoded signal received from an encoder, as long as the encoded signal conforms to an accepted video signal encoding standard, such as the well-known MPEG-1 and MPEG-2 standards. In addition to raw video data, the encoder typically encodes instructions to the decoder as to how the raw video data should be interpreted and represented (i.e., displayed). For example, an encoded video stream may include instructions that a subsequent video picture is identical to a previous picture in a video stream. In this case, the encoded video stream can be further compressed, because the encoder need not send any raw video data for the subsequent video picture. When the decoder receives the instruction, the decoder will simply represent the subsequent picture using the same raw video data provided for the previous picture. Those skilled in the art will appreciate that such instructions can be provided in a variety of ways, including setting a flag or bit within a data stream.

[0056] FIG. 5 is a block diagram depicting an exemplary selection of picture encoding modes in a GOP. As described above in connection with FIG. 1, the video stream can be described in terms of I-pictures 502, B-pictures 504, and P-pictures 506. A video stream can be represented by a series of groups of pictures (GOPs). Each GOP begins with an I-picture and includes one or more P-pictures and/or B-pictures. As described above, the I-picture requires the most video data and is represented without reference to any other picture in the video stream. The P-picture 506 can be represented in terms of differences with the I-picture 502. Likewise, the B-picture 504 can be represented in terms of differences with the I-picture 502 and/or the P-picture 506. In conventional encoding methods, the size of the GOP 508 is arbitrarily set to a specific number of pictures. Consequently, during the encoding process, the first picture is classified as the I-picture and is followed by a collection of P-pictures and B-pictures. When the predetermined number of pictures has been collected into a GOP, a new GOP can be started. The new GOP is started by identifying the next picture as an I-picture.

[0057] In an exemplary embodiment of the present invention, the size of each GOP may be variable. In one embodiment, I-Frames coincide with scene changes in the input video stream. As is well known in the art, a scene change can be detected by significant changes and/or structural breakdown of motion vectors from one picture to the next. Once a scene change has been detected, the picture following the scene change (i.e., the first picture of the new scene) may be classified as an I-picture.

[0058] FIG. 6 is a block diagram depicting an exemplary timeline comparing the occurrence of scene changes in a video stream with alternative GOP size formats. The video stream 600 is represented as a series of four scenes. Scene changes occur at times 608, 610, and 612. In a conventional encoding system, the GOP is set at a constant number of frames, as depicted by GOP series 604. Notably, the I-Frames in GOP format 604 occur at times 616, 618, 620, and 622. None of these times corresponds with the times of the scene changes in the video stream 600.

[0059] The variable GOP format 602 is an exemplary embodiment of the present invention. Typically, the I-Frames of the variable GOP format coincide with the scene changes in the video stream 600. However, where a scene is sufficiently long, the variable GOP format 602 will default to a constant GOP size and insert an I-picture as needed, as shown at time 606. Consequently, some GOPs of the variable GOP format 602 will be longer than the typical size of constant GOP format 604. Other GOPs of the variable GOP format 602 (e.g., GOP 614) will be significantly longer than the typical size of the constant GOP format 604.

[0060] A major objective of the variable GOP format 602 of an exemplary embodiment of the present invention is to make I-pictures coincide with scene changes. Because both I-pictures and scene changes require the greatest amount of video data storage, the coincidence of these frames reduces the amount of data required to represent an encoded video stream. Another major objective of the variable GOP format 602 of an exemplary embodiment of the present invention is to maximize the benefit of novel adaptive bit allocation and conditional replenishment methods that are described in more detail in connection with FIGS. 8-12.

[0061] An Exemplary Method for Generating Variably-sized Groups of Pictures

[0062] FIG. 7 is a flowchart depicting an exemplary method for creating GOPs of varying sizes. The method begins at start block 700 and proceeds to step 702. At step 702, the first GOP is created and a first picture from an input video stream is retrieved. The method proceeds to step 704, wherein the first picture is classified as the I-picture and is added to the first GOP.

[0063] The method proceeds from step 704 to decision block 706. At decision block 706, a determination is made as to whether more pictures exist in the input video stream. If a determination is made that more pictures exist in the video stream, the method branches to step 710. If, on the other hand, a determination is made that no more pictures exist in the video stream, the method branches to end block 708 and terminates.

[0064] At step 710, the next picture from the video stream is retrieved. The method then proceeds to decision block 712. At decision block 712, a determination is made as to whether the predefined GOP picture limit has been reached. As described above in connection with FIG. 6, in the case where a scene is longer than the predefined GOP size, the method will create a new GOP rather than allow the variable GOP to grow to an indefinite size. If the predefined GOP picture limit has been reached, the method branches to step 716 and a new GOP is started. If, on the other hand, the predefined GOP picture limit has not been reached, the method branches to decision block 714.

[0065] At decision block 714, a determination is made as to whether a scene change has been reached in the video stream. As described above, a scene change can be detected by various well-known means. If a scene change has been detected, the method branches to step 716 and a new GOP is started. If, on the other hand, a scene change has not been reached, the method branches to step 718 and the retrieved picture is added to the current GOP. The method proceeds from step 718 to decision block 706 and proceeds as described above.

[0066] Accordingly, pictures from an input video stream are added to a GOP until either a scene change occurs or the predefined GOP size is reached. Exemplary GOP sizes range from a minimum of 15 frames to a maximum of 60 frames. Those skilled in the art will appreciate that GOPs of widely varying sizes could be used within the scope of the present invention. As described above, the objective of the exemplary method is to make scene changes and I-Frames coincide, so as to minimize the number of I-Frames and scene change frames stored in an encoded video stream.
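The grouping loop of FIG. 7 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `group_pictures`, the `is_scene_change` callback, and the `gop_limit` parameter are hypothetical names, the scene-change detector is supplied by the caller, and the picture that triggers step 716 is assumed to become the first picture (I-picture) of the new GOP. The exemplary minimum GOP size is not enforced here.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def group_pictures(
    pictures: Sequence[P],
    is_scene_change: Callable[[P, P], bool],  # hypothetical detector supplied by the caller
    gop_limit: int = 60,  # exemplary maximum GOP size from the description
) -> List[List[P]]:
    """Split pictures into GOPs; the first picture of each GOP is its I-picture."""
    if gop_limit < 1:
        raise ValueError("gop_limit must be at least 1")
    gops: List[List[P]] = []
    for picture in pictures:
        if not gops:
            # Steps 702-704: the first picture opens the first GOP.
            gops.append([picture])
            continue
        current = gops[-1]
        if len(current) >= gop_limit:
            # Decision block 712: picture limit reached, start a new GOP (step 716).
            gops.append([picture])
        elif is_scene_change(current[-1], picture):
            # Decision block 714: scene change detected, start a new GOP (step 716).
            gops.append([picture])
        else:
            # Step 718: add the picture to the current GOP.
            current.append(picture)
    return gops


# Toy input: each "picture" is only a scene label, and a label change is a scene change.
scenes = ["a"] * 3 + ["b"] * 7 + ["c"] * 2
gop_sizes = [len(g) for g in group_pictures(scenes, lambda prev, cur: prev != cur, gop_limit=5)]
print(gop_sizes)
```

In the toy input, the seven-picture scene exceeds the limit of five, so it is split into two GOPs, which mirrors the inserted I-picture at time 606 of FIG. 6.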

[0067] FIG. 8 is a graph depicting a typical relationship between the bits generated by a conventional compression method and a conventional group of pictures. The graph 800 is divided into three groups of pictures (GOPs) 802, 804, 806. Each GOP 802, 804, 806 begins with an I-picture 808, 810, 812. As described above, most conventional compression methods remove irrelevant, redundant, and/or expendable bits from a video stream. This is done by removing as much video data as possible from each picture in an input video stream. In addition, conventional compression methods encode pictures such that the content of the encoded pictures can be predicted from previous and/or subsequent pictures in the encoded video stream. Accordingly, much of the video data for such predictable pictures can be eliminated from the encoded video stream, thereby further reducing the size of (i.e., further compressing) the encoded video stream. I-pictures 808, 810, 812, however, are used to predict the video data content of other pictures (e.g., B-pictures, P-pictures) and typically contain more video data than other pictures in an encoded video stream.

[0068] Referring again to FIG. 8, it is apparent that for the I-pictures 808, 810, 812 more bits are generated during the compression process than for non-I-pictures 814, 816, 818. As described above, conventional compression methods select pictures in an input video stream as I-pictures in an arbitrary fashion, based primarily on the number of pictures in a particular GOP. In an exemplary embodiment of the present invention, I-pictures 808, 810, 812 can be selected to coincide with scene changes. Typically, scene-change pictures and I-pictures require the compression process to generate more bits than non-scene-change pictures or non-I-pictures do. By classifying scene-change pictures as I-pictures, an exemplary embodiment of the present invention reduces the overall number of bits generated by the compression process. Because a large number of bits must be stored with an I-picture, regardless of the picture content, classifying scene-change pictures as I-pictures simply capitalizes on this feature to reduce the overall number of bits generated by the compression process.

[0069] FIG. 9 is a series of block diagrams and graphs comparing the generated bit graph of a conventional compression method with a generated bit graph of an exemplary embodiment of the present invention. An input video stream is represented as a block diagram 900 divided into scenes. As described above, a conventional compression method divides groups of pictures on a fixed basis (i.e., the same number of pictures per group). A fixed-sized GOP structure is depicted as a block diagram 904. As described in connection with FIG. 8, each GOP begins with an I-picture 910-916. The fixed-sized GOP graph 908 has generated bit peaks that coincide with the I-pictures 910-916 of each of the fixed-sized GOPs in the block diagram 904. In addition, the fixed-sized GOP graph 908 also includes peaks coinciding with the scene changes between Scene 1 and Scene 2, between Scene 2 and Scene 3, and between Scene 3 and Scene 4. Accordingly, the conventional, fixed-size GOP compression method generates output bit peaks for both I-pictures and scene-change pictures. Therefore, the bit budget for the remaining P-pictures and B-pictures is decreased. The encoding quality of the remaining P-pictures and B-pictures is, therefore, compromised or degraded.

[0070] The variable-sized GOP graph 906, on the other hand, depicts output bit peaks coinciding primarily with scene changes in the input video stream 900. Accordingly, the variable-sized GOP compression method of an exemplary embodiment of the present invention reduces the number of output bit peaks in the encoded video stream. More specifically, the variable-sized GOP compression method minimizes the number of double output bit peaks. These double peaks are present in the fixed-sized GOP graph 908 and are created when scene changes occur within a GOP, instead of coinciding with an I-picture of the GOP. As a result, the overall number of output bits generated by the fixed-sized GOP compression method is greater than the overall number of bits generated by the variable-sized GOP compression method of an exemplary embodiment of the present invention.

[0071] Accordingly, the exemplary compression method results in a smaller number of generated compression bits. This advantage provides various benefits to an encoding/decoding process. First, the resultant, smaller encoded video stream can be stored and/or transmitted in its smaller state, thereby conserving system resources. Alternatively, the encoding quality can be improved by re-allocating bits from smaller GOPs to larger GOPs. This is referred to as adaptive bit allocation, because the bits allocated to a given GOP can be adapted to the GOP size, which varies depending on the scene changes in the input video stream. This benefit is described in more detail in connection with FIG. 10.

[0072] Exemplary Methods for Adaptive Bit Allocation

[0073] FIG. 10a is a flow chart depicting an exemplary method for adaptively allocating bits among variable-sized groups of pictures (GOPs). In an exemplary embodiment of the present invention, bits can be allocated among the variable-sized GOPs. In addition, bits may be allocated among the pictures within a single GOP. These methods may be utilized individually or in concert to maximize the image quality of a compressed video stream and of the pictures within a GOP, while benefiting from the enhanced compression processes of exemplary embodiments of the present invention.

[0074] The method of FIG. 10a begins at start block 1000 and proceeds to step 1002. At step 1002, the target bit number of a first GOP is determined. This step may be performed prior to encoding a GOP. For example, after an input stream has been segregated into GOPs, the GOPs may be stored in a buffer. Because the GOPs in the buffer may have different sizes (i.e., contain variable numbers of pictures), they also may have different numbers of bits allocated thereto. The method of FIG. 10a provides a means for adaptively allocating bits among GOPs, depending on the relative sizes of the GOPs.

[0075] The method proceeds from step 1002 to step 1004. At step 1004, the number of bits actually generated for the pictures in the GOP is determined. The method proceeds from step 1004 to decision block 1006. At decision block 1006, a determination is made as to whether the bit size of the first GOP is less than the target bit number. If the GOP bit size is less than the target bit number, the method branches to step 1010. If, on the other hand, the GOP bit size is not less than the target bit number, the method branches to end block 1016 and terminates.

[0076] At step 1010, the size and target bit number of a second GOP are determined. The method proceeds from step 1010 to step 1014. At step 1014, bits from the first GOP are allocated to the second GOP. That is, bits that would otherwise be assigned to the first GOP are reassigned to the second GOP, so that the quality of the second GOP is enhanced. As described above, the picture quality of the encoded video stream is directly related to the bit rate of the encoded video stream. Accordingly, by reallocating bits between GOPs in a video stream, an exemplary embodiment of the present invention can maximize the quality of the GOPs having bit sizes larger than the target size, while retaining the picture quality of GOPs having bit sizes less than the target bit size. Conventional encoding methods cap the bit size of any given GOP at the target bit size. Thus, for GOPs having a larger bit size, the picture quality is reduced as compared to those GOPs having smaller bit sizes.
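The decision of FIG. 10a can be sketched in a few lines of Python. This is a hedged illustration rather than the claimed method: the function name `carry_surplus` and its parameters are hypothetical, and the sketch assumes that the entire unused portion of the first GOP's target is handed to the second GOP, a quantity the description leaves open.

```python
def carry_surplus(first_target: int, first_generated: int, second_target: int) -> int:
    """Return the bit budget of the second GOP after reallocation from the first GOP."""
    # Decision block 1006: is the first GOP's actual bit size below its target?
    surplus = first_target - first_generated
    if surplus > 0:
        # Step 1014 (assumption: the whole surplus is transferred).
        return second_target + surplus
    # Otherwise the method terminates and the second GOP keeps its own target.
    return second_target


# A first GOP that needed only 800 of its 1000 target bits frees 200 bits.
print(carry_surplus(1000, 800, 1000))
```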

[0077] FIG. 10b is a flow chart depicting an exemplary method for adaptively allocating bits among pictures within a GOP. In this embodiment of the present invention, bits can be adaptively allocated between pictures within a GOP. For a GOP containing N frames, N-1 bit values can be allocated to the non-I-picture frames. The bit allocation can be based on a per-picture target bit size. The bits may be allocated using the Root Mean Square (RMS) of the difference between the successive frames. Preferably, the amount of bit allocation for the i-th picture in a GOP can be calculated as follows:

$$T_{p}(i) = \frac{R \times \mathrm{RMS}(i)}{\sum_{l=1}^{N-1} \mathrm{RMS}(l)}$$

[0078] where T_p(i) represents the target bit rate for a current picture, R represents the target bit rate for the remaining pictures in the GOP, and RMS(i) represents the RMS value of the difference between the i-th picture and the (i-1)-th picture in the GOP. After encoding each picture in the GOP, the target bit rate for the remaining pictures in the GOP (R) can be updated by subtracting the number of actually generated bits for each picture. When the number of bits that have actually been generated for all of the pictures in the GOP is less than the target bit rate, then the bits may be made available for allocation to pictures in other GOPs. In this embodiment of the present invention, bits can be allocated on a picture-by-picture basis within a GOP, so as to maximize the picture quality on a picture-by-picture basis.
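The per-picture allocation formula can be evaluated directly, as in the following Python sketch. The function name `picture_targets` is hypothetical, the sketch evaluates the formula once for a fixed value of R, and the even split used when every RMS value is zero is an assumption that the description does not address.

```python
from typing import List, Sequence


def picture_targets(remaining_bits: float, rms_values: Sequence[float]) -> List[float]:
    """Evaluate T_p(i) = R * RMS(i) / sum(RMS(l)) for each non-I-picture in a GOP."""
    if not rms_values:
        return []
    total = sum(rms_values)
    if total == 0:
        # Assumption (not in the description): identical pictures share R evenly.
        return [remaining_bits / len(rms_values)] * len(rms_values)
    return [remaining_bits * rms / total for rms in rms_values]


# Three non-I-pictures; the first differs most from its predecessor.
targets = picture_targets(4000, [2.0, 1.0, 1.0])
print(targets)
```

With a fixed R the targets sum to R, so a picture that changes more relative to its predecessor receives a proportionally larger share. The description also updates R after each picture is encoded; the sketch does not model that update.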

[0079] Turning now to FIG. 10b, an exemplary method is depicted, wherein bits are adaptively allocated among the pictures in a GOP. The method of FIG. 10b may be implemented at the time that the picture size (i.e., number of pictures) for a subject (current) GOP has been defined, for example, by the Picture Grouping Module 316 described in connection with FIG. 3. The method begins at start block 1050 and proceeds to step 1052. At step 1052, the size of the GOP is determined. This step may be performed by the Picture Grouping Module 316 or the pictures in the GOP may simply be re-counted. The method then proceeds to step 1054, wherein the target bit number for the current GOP is determined. Typically, a compression process is implemented for a particular application wherein an overall bit rate is predetermined. Those skilled in the art will appreciate that this overall bit rate may be used to determine a bit rate on a per-picture basis.

[0080] The method proceeds from step 1054 to step 1056. At step 1056, the Root Mean Square (RMS) of the difference between a current picture and a previous picture is determined. Initially, the current picture will be the first picture in the GOP. This step can be performed using the formula described above. The method then proceeds to step 1058, wherein the appropriate number of bits is actually allocated to the current picture. The method then proceeds to decision block 1060, wherein a determination is made as to whether all of the pictures in the GOP have been encoded. If a determination is made that all of the pictures in the GOP have been encoded, the method branches to decision block 1062. If, on the other hand, a determination is made that not all of the pictures in the GOP have been encoded, the method branches to step 1068.

[0081] At step 1068, the current picture is incremented. That is, the next picture in the GOP is identified for bit allocation consideration. The method then proceeds to step 1056 and proceeds as described above. Returning now to decision block 1062, a determination is made as to whether the number of bits actually generated by encoding all of the pictures in the GOP is less than the target bit total for all of the pictures in the GOP. If the number of bits actually generated by encoding the pictures in the GOP is not less than the target bit total for all of the pictures in the GOP, then the method branches to end block 1066 and terminates. If, on the other hand, the number of bits actually generated by encoding the pictures in the GOP is less than the target bit total for all of the pictures in the GOP, then the method branches to step 1064. At step 1064, the remaining (unallocated) bits are made available to the next GOP (or some other subsequently processed GOP) to be considered for bit allocation. The method proceeds from step 1064 to end block 1066 and terminates.

[0082] Accordingly, the method efficiently allocates bits among pictures within a GOP. Where a surplus of bits exists, the method can make those bits available for subsequent GOPs, for which such a surplus does not exist. Because the GOP size is variable in accordance with exemplary embodiments of the present invention, this bit allocation method capitalizes on bit surpluses that are created by using variable GOP sizes. The described bit allocation methods can be used to significantly improve the output quality of an encoding system by efficiently using bits that might otherwise be imprudently allocated.

[0083] An Exemplary Method of Conditional Replenishment

[0084] Conditional replenishment is a well-known aspect of conventional compression methods. Generally, conditional replenishment refers to the elimination of redundant video data in a condition wherein video data remains unchanged between successive pictures in a GOP. More specifically, conditional replenishment is a method of “re-using” (i.e., replenishing) previously encoded video data to populate an area of a video image that is unchanged from a previous video image. When possible, such replenishment reduces the amount of new video data that must be encoded, therefore reducing the output bit rate and increasing output signal quality.

[0085] Because successive pictures within an exemplary variable-sized GOP are typically members of the same scene in an input video stream, the opportunity for conditional replenishment is increased within a given GOP. Accordingly, the scene-oriented GOP sizing of exemplary embodiments of the present invention enhances the performance of conventional replenishment methods. In addition, because of the similarity between successive pictures in a given GOP, a novel variation of conditional replenishment is applied in an exemplary embodiment of the present invention to further enhance video stream compression.

[0086] FIG. 11 is a simplified illustration depicting successive pictures in an exemplary GOP divided into macroblocks. Picture 1100 is divided into macroblocks 1102-1114. Likewise, picture 1150 is divided into macroblocks 1152-1164. Although the image in picture 1100 is different than the image in picture 1150, only certain macroblocks are different. Specifically, macroblocks 1102-1110 of picture 1100 are different than macroblocks 1152-1160 of picture 1150. On the other hand, macroblocks 1112-1114 of picture 1100 are identical to macroblocks 1162-1164 of picture 1150. Accordingly, picture 1150 may be represented (i.e., encoded) as being identical to picture 1100, except for changes to macroblocks 1152-1160.

[0087] When it is determined that a difference exists between corresponding coded pixels in the macroblock, the differences can be stored or transmitted in connection with the corresponding picture. If, on the other hand, it is determined that no difference exists between corresponding coded pixels, then a flag can be set (or other instruction provided) to indicate that the pixel from the previous picture can be used, thereby eliminating a need to store additional information for the successive picture.

[0088] In conventional conditional replenishment, the replenishment condition is determined by examining the results of the encoding process. If the encoding results (quantized DCT coefficients) are exactly the same between the macroblocks of the current frame and the previous frame, replenishment is used. In an exemplary embodiment of the present invention, on the other hand, conditional replenishment is performed intelligently by the encoder, based on a calculation of relevant criteria. Accordingly, if the encoder does not detect a replenishment condition, any change detected between corresponding macroblocks in successive pictures may be stored or transmitted. On the other hand, when the encoder detects a replenishment condition, then an instruction and/or flag can be used to indicate that the macroblock should be replenished using the video data from the previous picture.

[0089] Advantageously, conditional replenishment on a macroblock basis enables noise reduction in an encoded video stream. When an encoded video stream is decoded, noise is commonly detectable in a displayed video stream as a flickering or otherwise perceivable image. Often, such noise is more perceivable when it occurs in a background region (i.e., a region of substantially constant image intensity). In an exemplary embodiment of the present invention, conditional replenishment is processed on a macroblock basis, utilizing 2-part criteria and selectable thresholds for tuning the criteria. As a result, slight differences resulting from noise in a particular macroblock can be muted (i.e., filtered). The first criterion can be used to determine the differences between an original macroblock and a previous macroblock. This criterion, C1, is given by the expression:

$$C1 = \sqrt{\frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16}\left(\mathrm{org}(i,j) - \mathrm{prev}(i,j)\right)^{2}}$$

[0090] where org(i,j) represents the pixel at position (i,j) of the original (subject) macroblock and prev(i,j) represents the pixel at position (i,j) of the original macroblock of the previous frame.

[0091] The second criterion may be used to evaluate the effect of the decoder, by reference to the original macroblock. The second criterion, C2, is given by the expression:

$$C2 = \sqrt{\frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16}\left(\mathrm{org}(i,j) - \mathrm{coded}(i,j)\right)^{2}}$$

[0092] where org(i,j) represents the pixel at position (i,j) of the original (subject) macroblock and coded(i,j) represents the pixel at position (i,j) of the decoded macroblock of the previous frame. Criterion 1 is a measurement of the similarity of the corresponding macroblocks of the current frame and the previous frame. Criterion 2 serves as a double check of the similarity, using the decoded macroblock.
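Both criteria are the same root-mean-square difference applied to different macroblock pairs, which the following Python sketch makes explicit. The function name `rms_macroblock_difference` and the representation of a macroblock as 16 rows of 16 numeric pixel values are assumptions made for illustration.

```python
import math
from typing import Sequence

Macroblock = Sequence[Sequence[float]]  # assumed layout: 16 rows of 16 pixel values


def rms_macroblock_difference(block_a: Macroblock, block_b: Macroblock) -> float:
    """RMS of the pixel-wise difference between two 16x16 macroblocks."""
    for block in (block_a, block_b):
        if len(block) != 16 or any(len(row) != 16 for row in block):
            raise ValueError("each macroblock must be 16x16")
    squared_sum = 0.0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            squared_sum += (a - b) ** 2
    return math.sqrt(squared_sum / 256)


# Uniform toy macroblocks: the original, the previous original, and the previous decoded.
org = [[10] * 16 for _ in range(16)]
prev = [[13] * 16 for _ in range(16)]
coded = [[10] * 16 for _ in range(16)]

c1 = rms_macroblock_difference(org, prev)   # Criterion 1: original vs. previous original
c2 = rms_macroblock_difference(org, coded)  # Criterion 2: original vs. previous decoded
print(c1, c2)
```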

[0093] In addition, threshold values may be selected for the two criteria, to set the sensitivity of the conditional replenishment process. Alternatively, the threshold may be automatically set such that it is adaptive to a particular bit rate. The following table provides an exemplary relationship between bit rate and Criterion 1 (C1) threshold values.

BIT RATE              THRESHOLD 1
greater than 400 k    8
300 k-400 k           11
200 k-300 k           13
110 k-200 k           14
less than 100 k       15

[0094] Similarly, the threshold value for Criterion 2 may be set manually or automatically (an exemplary value for Threshold 2 is 8). By applying the 2-part criteria in conjunction with the threshold values, the macroblock-based conditional replenishment method of an exemplary embodiment of the present invention can be used and fine-tuned to reduce noise in a displayed video stream.

[0095] FIG. 12 is a flowchart depicting an exemplary method for performing conditional replenishment on a macroblock basis. The method of FIG. 12 begins at start block 1200 and proceeds to step 1202, wherein a first macroblock is compared to a second macroblock. The method then proceeds to decision block 1204, wherein a determination is made as to whether Criterion 1 (C1) is less than Threshold 1. If at decision block 1204, a determination is made that Criterion 1 is not less than Threshold 1, the method branches to step 1210. At step 1210, a flag can be set for an instruction providing that the second macroblock should be encoded using the data from the first macroblock, rather than simply replenished. The method proceeds from step 1210 to end block 1212 and terminates.

[0096] Returning now to decision block 1204, if a determination is made that Criterion 1 is less than Threshold 1, the method branches to decision block 1206. At decision block 1206, a determination is made as to whether Criterion 2 is less than Threshold 2. If a determination is made at decision block 1206 that Criterion 2 is not less than Threshold 2, the method branches to step 1210 and proceeds as described above. If, on the other hand, a determination is made at decision block 1206 that Criterion 2 is less than Threshold 2, the method branches to step 1208. At step 1208, the replenishment flag is set for the second macroblock. The method proceeds from step 1208 to end block 1212 and terminates.
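The two-stage test of FIG. 12 reduces to a pair of strict comparisons, sketched below in Python. The function name `should_replenish` is hypothetical. The default of 8 for Threshold 2 is the exemplary value given in the description, and the value 8 used for Threshold 1 in the example is the exemplary entry for bit rates greater than 400 k.

```python
def should_replenish(c1: float, c2: float, threshold1: float, threshold2: float = 8) -> bool:
    """Return True when the replenishment flag would be set (step 1208)."""
    if not c1 < threshold1:
        # Decision block 1204 fails: encode the macroblock instead (step 1210).
        return False
    if not c2 < threshold2:
        # Decision block 1206 fails: encode the macroblock instead (step 1210).
        return False
    return True


# Small differences on both criteria: the macroblock is replenished.
print(should_replenish(3.0, 2.0, threshold1=8))
# Criterion 1 at or above its threshold: the macroblock is encoded.
print(should_replenish(9.0, 2.0, threshold1=8))
```

Because both comparisons are strict, a criterion value exactly equal to its threshold causes the macroblock to be encoded rather than replenished, which matches the "not less than" branches of the flowchart.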

[0097] Accordingly, the method of FIG. 12 can be used to utilize selectable criteria to reduce the encoding, decoding and display of noise. The replenishment of an exemplary embodiment of the present invention, thus, can be used to filter noise from a displayed video stream. Those skilled in the art will appreciate that various criteria and/or threshold values may be used within the scope of the described embodiments of the present invention.

[0098] An Exemplary Method for Selecting an Asynchronous Sampling Technique

[0099] To maximize the quality of compressed video at a low bit rate (e.g., less than 128 kbps), it may be useful to sample the video at optimum points in time and space. Sampling is roughly defined as the determination of which pictures in a video stream will be encoded as I-pictures, B-pictures, and P-pictures. Generally, optimum sampling can be non-uniform (asynchronous) in one or both of the space and time domains. Various asynchronous techniques are well known to those skilled in the art and can be used to implement various embodiments of the present invention. In an exemplary embodiment of the present invention, an analysis-by-synthesis method of selecting an asynchronous sampling technique is provided. In the exemplary analysis-by-synthesis method, separately encoded candidate streams are generated using various sampling methods. Once generated, the separate candidate streams can be compared on virtually any basis to determine, for example, which has the best bit rate and signal quality characteristics. The best candidate stream can be selected and designated as the output video stream. The selected sampling method can be identified to the receiver (decoder) with a small overhead. For example, by using a codebook or dictionary of 16 possible sampling techniques, only 4 bits of overhead are needed to signify the selection. The codebook could be either predetermined or generated adaptively (and automatically) over time, based on criteria including extrapolation from a recent history of optimum sampling.

[0100] FIG. 13 is a flowchart depicting an exemplary method for generating and selecting between two sampling methods. Those skilled in the art will appreciate that any number of sampling methods could be used and evaluated within the scope of the present invention. It also will be appreciated that the generation of multiple candidate streams creates overhead as described above, and that the exemplary sampling selection method may be more easily applied to one-way communications (e.g., video streaming) than to two-way communications (e.g., video teleconferencing).

[0101] The method of FIG. 13 begins at start block 1300 and proceeds to step 1302. At step 1302, a first input video stream is encoded using a first sampling technique. The method then proceeds to step 1304. At step 1304, a second input stream is encoded using a second sampling technique. The method then proceeds to step 1306, wherein the encoded candidate video streams are compared. This comparison could be based on various characteristics of the candidate video streams. However, it is preferable that the characteristics are perceptually meaningful characteristics. An exemplary characteristic is the signal-to-noise ratio of each encoded candidate video stream, as compared to the original uncompressed signal.

[0102] The method proceeds from step 1306 to decision block 1308. At decision block 1308, a determination is made as to whether the signal-to-noise ratio (SNR) for the first stream is higher than the SNR for the second stream. If the SNR for the first stream is better than the SNR for the second stream, then the method branches to step 1310. At step 1310, the first stream is output. Returning to decision block 1308, if the SNR for the second stream is better than the SNR for the first stream, then the method branches to step 1312. At step 1312, the second stream is output. Accordingly, the encoded candidate streams having been encoded using different sampling techniques are compared and the best stream is output, for example, from an encoding system, together with the overhead information that signifies the corresponding sampling method.
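The selection step of FIG. 13 can be sketched in Python as follows. Several details are assumptions made only for illustration: the names `snr_db` and `select_stream` are hypothetical, each candidate is represented by its codebook index and its decoded samples, the SNR is computed as the ratio of signal energy to error energy in decibels (the description does not define the SNR computation), and ties are resolved in favor of the earlier candidate.

```python
import math
from typing import List, Sequence, Tuple


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """SNR in dB of a decoded candidate relative to the original samples (assumed definition)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf  # a perfect reconstruction
    if signal == 0:
        return -math.inf  # an all-zero original with a nonzero error
    return 10 * math.log10(signal / noise)


def select_stream(original: Sequence[float],
                  candidates: List[Tuple[int, Sequence[float]]]) -> int:
    """Return the codebook index of the candidate with the highest SNR."""
    best_index, _ = max(candidates, key=lambda c: snr_db(original, c[1]))
    return best_index


original = [10, 20, 30]
candidates = [(3, [10, 21, 30]), (7, [12, 18, 33])]  # (codebook index, decoded samples)
chosen = select_stream(original, candidates)
overhead_bits = format(chosen, "04b")  # 16 codebook entries fit in 4 bits
print(chosen, overhead_bits)
```

The 4-bit string corresponds to the overhead described for a codebook of 16 sampling techniques, since 16 entries can be indexed with log2(16) = 4 bits.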

[0103] Although the present invention has been described in connection with various exemplary embodiments, those of ordinary skill in the art will understand that many modifications can be made thereto within the scope of the claims that follow. Accordingly, it is not intended that the scope of the invention in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.

What is claimed is:
 1. A method for processing an input video stream comprising a series of pictures, the method comprising the steps of: detecting a first scene change between a first scene in the input video stream and a second scene in the input video stream; and classifying a first picture in the input video stream as a first intra-picture (I-picture), wherein the first picture coincides with the first scene change.
 2. The method of claim 1, further comprising the steps of: determining whether there are a predetermined number of pictures between the first intra-picture and a second scene change; classifying a second picture in the input video stream as a second intra-picture, in response to a determination that the predetermined number of pictures exist between the first intra-picture and the second scene change, wherein the second picture coincides with the predetermined number of pictures.
 3. The method of claim 2, further comprising the steps of: classifying a third picture in the input video stream as a third intra-picture, wherein the third intra-picture coincides with the second scene change.
 4. The method of claim 1, wherein the step of determining a scene change, comprises the step of determining whether a change in a motion vector in the first picture exceeds a predetermined motion vector threshold.
 5. A system for organizing a series of pictures in an input video stream into at least one group of pictures (GOP), comprising: a scene change detector operative to detect a scene change in the series of pictures and to classify a first picture following the scene change as a first intra-picture (I-picture) and to classify at least one other picture following the scene change as a predicted picture (P-picture) and to classify at least one second picture as a bi-directionally predicted picture (B-picture); and a bit allocation module operative to determine whether a first GOP uses less than a predetermined target number of bits and further operative to allocate an unneeded bit to a second GOP in response to a determination that the first GOP uses less than the predetermined target number of bits.
 6. The system of claim 5, further comprising a bit rate controller operative to compare a previous macroblock of a first picture to a subsequent macroblock in a second picture and to determine that the subsequent macroblock is different than the previous macroblock.
 7. The system of claim 6, wherein the bit rate controller is further operative to determine a first criterion characterizing the relationship between the previous macroblock and the subsequent macroblock and to compare the first criterion to a first threshold value.
 8. The system of claim 7, further comprising a decoder operative to represent the subsequent macroblock in an output video stream, wherein the bit rate controller is further operative to instruct the decoder to represent the subsequent macroblock in an identical form as the previous macroblock, in response to a determination that the first criterion is less than the first threshold value.
 9. The system of claim 7, wherein the bit rate controller is further operative to instruct the decoder to represent the subsequent macroblock in a non-identical form as the previous macroblock, in response to a determination that the first criterion is less than the first threshold value.
 10. An encoding system for compressing an input video stream having a series of pictures, the encoding system comprising: a video encoder operative to receive the input video stream and an input control stream and to generate an encoded video stream; a picture grouping module operative to receive the input video stream and to generate at least one adaptive picture grouping for the pictures in the encoded video stream; and a bit allocation module operative to receive the input video stream and to adaptively allocate bits among the series of pictures and to adaptively allocate bits among the adaptive picture groupings.
 11. The encoding system of claim 10, wherein the adaptive grouping comprises classifying the pictures in the input video stream as intra-pictures (I-pictures), predicted-pictures (P-pictures), and bidirectionally predicted pictures (B-pictures).
 12. The encoding system of claim 10, further comprising a bit rate controller operative to compare a previous macroblock of a first picture to a subsequent macroblock in a second picture and to determine that the subsequent macroblock is different than the previous macroblock.
 13. The encoding system of claim 12, wherein the bit rate controller is further operative to determine a first criterion characterizing the relationship between the previous macroblock and the subsequent macroblock and to compare the first criterion to a first threshold value and to instruct a decoder to represent the subsequent macroblock in an identical form as the previous macroblock, in response to a determination that the first criterion is less than the first threshold value.
 14. A method for selecting a video stream sampling technique, the method comprising the steps of: encoding an input video stream using a first sampling technique to generate a first encoded video stream; encoding an input video stream using a second sampling technique to generate a second encoded video stream; comparing at least one characteristic of the first encoded video stream to at least one characteristic of the second encoded video stream; selecting the first encoded video stream as an output encoded video stream, in response to a determination that the at least one characteristic of the first encoded video stream is preferable to the at least one characteristic of the second encoded video stream; and selecting the second encoded video stream as an output encoded video stream, in response to a determination that the at least one characteristic of the second encoded video stream is preferable to the at least one characteristic of the first encoded video stream.
 15. A method for adaptively grouping pictures in an input video stream, the method comprising: creating a first group of pictures (GOP); classifying a first picture in the input video stream as an intra-picture (I-picture) and adding the first picture to the first GOP; retrieving a second picture from the input video stream; making a determination as to whether the second picture in the input video stream coincides with a scene change; classifying the second picture as an I-picture, in response to a determination that the second picture in the input video stream coincides with a scene change; and classifying the second picture as a non-I-picture and adding the second picture to the first GOP, in response to a determination that the second picture in the input video stream does not coincide with a scene change.
 16. The method of claim 15, further comprising the step of creating a second GOP and adding the second picture to the second GOP, in response to a determination that the second picture in the input video stream coincides with a scene change.
 17. The method of claim 16, wherein the first GOP and the second GOP can contain different numbers of pictures.
 18. The method of claim 15, wherein the non-I-picture is a predicted picture (P-picture).
 19. The method of claim 15, wherein the non-I-picture is a bidirectionally predicted picture (B-picture).
 20. The method of claim 15, wherein the determination that the second picture in the input video stream coincides with a scene change comprises making a determination that a motion vector corresponding to the second picture has been changed.
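
By way of illustration only, the adaptive grouping recited in claims 1 to 3 and 15 to 17 can be sketched as follows. The sketch is not part of the claims and does not limit them. The function name `group_pictures`, the per-picture scene-change flags supplied as input (standing in for a detector such as the motion-vector test of claims 4 and 20), the parameter `max_gop_len` (counting the I-picture itself), and the labelling of every non-I-picture as a P-picture are assumptions made for this example.

```python
def group_pictures(scene_change_flags, max_gop_len):
    """Group pictures into variably-sized GOPs.

    scene_change_flags[i] is True when picture i coincides with a scene
    change. A new GOP, starting with an I-picture, is opened at the first
    picture, at every scene change, and whenever the current GOP already
    holds max_gop_len pictures. All other pictures are labelled "P" here;
    B-pictures are omitted to keep the sketch short.

    Returns a list of GOPs, each a list of (picture_index, picture_type).
    """
    if max_gop_len < 1:
        raise ValueError("max_gop_len must be at least 1")
    gops = []
    for index, is_scene_change in enumerate(scene_change_flags):
        if not gops or is_scene_change or len(gops[-1]) >= max_gop_len:
            gops.append([(index, "I")])       # start a new GOP
        else:
            gops[-1].append((index, "P"))     # extend the current GOP
    return gops


if __name__ == "__main__":
    # Scene change at picture 3; no further scene change afterwards.
    flags = [False, False, False, True, False, False, False, False, False]
    for gop in group_pictures(flags, max_gop_len=4):
        print(gop)
```

With these inputs the three resulting GOPs hold 3, 4, and 2 pictures, showing that GOPs can contain different numbers of pictures as in claim 17.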